Adapting Rule Representation With Four-Parameter Beta Distribution for Learning Classifier Systems
Shiraishi, Hiroki, Hayamizu, Yohei, Hashiyama, Tomonori, Takadama, Keiki, Ishibuchi, Hisao, Nakata, Masaya
Rule representations significantly influence the search capabilities and decision boundaries within the search space of Learning Classifier Systems (LCSs), a family of rule-based machine learning systems that evolve interpretable models through evolutionary processes. However, it is difficult to choose an appropriate rule representation for each problem, and some problems benefit from using different representations for different subspaces within the input space. Thus, an adaptive mechanism is needed to choose an appropriate rule representation for each rule in LCSs. This article introduces a flexible rule representation using a four-parameter beta distribution and integrates it into a fuzzy-style LCS. The four-parameter beta distribution can form various function shapes, and this flexibility enables our LCS to automatically select appropriate representations for different subspaces. By controlling its four parameters, our rule representation can express crisp or fuzzy decision boundaries in various shapes, such as rectangles and bells, unlike standard representations such as trapezoidal ones. Leveraging this flexibility, our LCS is designed to adapt the rule representation to each subspace. Moreover, our LCS incorporates a generalization bias favoring crisp rules where feasible, enhancing model interpretability without compromising accuracy. Experimental results on real-world classification tasks show that our LCS achieves significantly superior test accuracy and produces more compact rule sets. Our implementation is available at https://github.com/YNU-NakataLab/Beta4-UCS. An extended abstract related to this work is available at https://doi.org/10.36227/techrxiv.174900805.59801248/v1.
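To make the representation concrete, here is a minimal sketch (not taken from the paper's code) of a one-dimensional rule condition modeled as a four-parameter beta distribution. The parameter names a, b (support bounds) and p, q (shapes), and the peak-to-one normalization, are assumptions for illustration; the point is that p = q = 1 yields a crisp interval, while larger shape values yield bell-like fuzzy boundaries.

```python
# Illustrative sketch only (assumed parameterization, not the paper's exact code):
# a one-dimensional rule condition as a four-parameter beta membership function.
import numpy as np

def beta4_membership(x, a, b, p, q):
    """Membership degree of x for a condition with support [a, b] and shapes p, q.

    The beta density on [a, b] is rescaled so that its peak equals 1, turning it
    into a fuzzy membership function. With p = q = 1 the condition becomes a
    crisp interval; larger p and q give increasingly bell-shaped boundaries.
    """
    x = np.asarray(x, dtype=float)
    inside = (x >= a) & (x <= b)
    z = np.zeros_like(x)
    t = (x[inside] - a) / (b - a)               # rescale the support to [0, 1]
    z[inside] = t ** (p - 1) * (1.0 - t) ** (q - 1)
    if p > 1 and q > 1:                         # unimodal case: divide by the value at the mode
        t_mode = (p - 1) / (p + q - 2)
        z /= t_mode ** (p - 1) * (1.0 - t_mode) ** (q - 1)
    return np.clip(z, 0.0, 1.0)

# Example: the same support [0.2, 0.8] as a bell-shaped fuzzy condition vs. a crisp one.
xs = np.linspace(0.0, 1.0, 5)
print(beta4_membership(xs, a=0.2, b=0.8, p=3.0, q=3.0))  # fuzzy bell
print(beta4_membership(xs, a=0.2, b=0.8, p=1.0, q=1.0))  # crisp rectangle
```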
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > New York > Broome County > Binghamton (0.04)
- North America > United States > Michigan (0.04)
- (8 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- (3 more...)
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines
Jiang, Dongzhi, Zhang, Renrui, Guo, Ziyu, Wu, Yanmin, Lei, Jiayi, Qiu, Pengshuo, Lu, Pan, Chen, Zehui, Song, Guanglu, Gao, Peng, Liu, Yu, Li, Chunyuan, Li, Hongsheng
The advent of Large Language Models (LLMs) has paved the way for AI search engines, e.g., SearchGPT, showcasing a new paradigm in human-internet interaction. However, most current AI search engines are limited to text-only settings, neglecting multimodal user queries and the text-image interleaved nature of website information. Recently, Large Multimodal Models (LMMs) have made impressive strides. Yet, whether they can function as AI search engines remains under-explored, leaving the potential of LMMs in multimodal search an open question. To this end, we first design a carefully crafted pipeline, MMSearch-Engine, to empower any LMM with multimodal search capabilities. On top of this, we introduce MMSearch, a comprehensive evaluation benchmark to assess the multimodal search performance of LMMs. The curated dataset contains 300 manually collected instances spanning 14 subfields, with no overlap with current LMMs' training data, ensuring the correct answers can only be obtained by searching. Using MMSearch-Engine, the LMMs are evaluated on three individual tasks (requery, rerank, and summarization) and one challenging end-to-end task with a complete searching process. We conduct extensive experiments on closed-source and open-source LMMs. Among all tested models, GPT-4o with MMSearch-Engine achieves the best results, surpassing the commercial product Perplexity Pro in the end-to-end task and demonstrating the effectiveness of our proposed pipeline. We further present an error analysis showing that current LMMs still struggle to fully grasp multimodal search tasks, and conduct an ablation study indicating the potential of scaling test-time computation for AI search engines. We hope MMSearch can provide unique insights to guide the future development of multimodal AI search engines. Project Page: https://mmsearch.github.io
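For a feel of the workflow, the sketch below strings the three tasks named in the abstract (requery, rerank, summarization) into one end-to-end round. All function names and the toy callables are hypothetical placeholders, not the actual MMSearch-Engine API.

```python
# Minimal sketch of the requery -> rerank -> summarization workflow described above.
# Every name here is hypothetical; the real pipeline prompts an LMM with the query
# image and page screenshots at each stage.

def search_pipeline(requery_fn, rerank_fn, summarize_fn, search_fn, query):
    """Run one end-to-end multimodal search round.

    requery_fn   : rewrites the (possibly multimodal) user query into a text
                   query for a conventional search engine
    search_fn    : returns a list of candidate results for a text query
    rerank_fn    : picks the most relevant candidate for the user query
    summarize_fn : answers the user query from the chosen candidate
    """
    text_query = requery_fn(query)          # task 1: requery
    candidates = search_fn(text_query)      # retrieval of candidate pages
    best = rerank_fn(query, candidates)     # task 2: rerank
    return summarize_fn(query, best)        # task 3: summarization

# Toy usage with placeholder callables standing in for LMM calls.
answer = search_pipeline(
    requery_fn=lambda q: q["text"],
    search_fn=lambda t: [{"snippet": f"page about {t}"}],
    rerank_fn=lambda q, cs: cs[0],
    summarize_fn=lambda q, page: f"Answer derived from: {page['snippet']}",
    query={"text": "latest benchmark results", "image": None},
)
print(answer)
```

Passing the model calls in as callables keeps the sketch runnable without any model; in the benchmark itself each stage would be an LMM prompt scored separately, plus the complete end-to-end run.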
- North America > United States > New York (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- (23 more...)
- Media (1.00)
- Government (0.94)
- Leisure & Entertainment > Games (0.94)
- (5 more...)
One Microphone Blind Dereverberation Based on Quasi-periodicity of Speech Signals
Nakatani, Tomohiro, Miyoshi, Masato, Kinoshita, Keisuke
Speech dereverberation is desirable for achieving, for example, robust speech recognition in the real world. However, it remains a challenging problem, especially when only a single microphone is available. Although blind equalization techniques have been exploited, they cannot deal with speech signals appropriately because their underlying assumptions are not satisfied by speech signals. We propose a new dereverberation principle based on an inherent property of speech signals, namely quasi-periodicity. The proposed methods learn the dereverberation filter from a large amount of speech data with no prior knowledge of the data, and can achieve high-quality speech dereverberation, especially when the reverberation time is long.
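As a rough sketch of the single-channel setting only: assuming the dereverberation filter has already been learned (estimating it from the quasi-periodic, harmonic structure of speech is the paper's contribution and is not reproduced here), dereverberation then amounts to filtering the observed signal with that learned inverse filter. The room response and hand-crafted inverse below are purely illustrative.

```python
# Minimal sketch, assuming a dereverberation (inverse) filter is already available;
# applying it to a single-channel observation is just a convolution.
import numpy as np

def apply_dereverberation_filter(reverberant, inverse_filter):
    """Filter a 1-D reverberant signal with a learned inverse (FIR) filter."""
    return np.convolve(reverberant, inverse_filter, mode="full")[:len(reverberant)]

# Toy usage: a synthetic "room" with one strong echo, and a truncated inverse
# filter that approximately cancels it (hand-crafted here, not learned).
rng = np.random.default_rng(0)
dry = rng.standard_normal(16000)        # stand-in for clean speech
room = np.zeros(800)
room[0], room[400] = 1.0, 0.6           # direct path + one echo
wet = np.convolve(dry, room)[:len(dry)] # reverberant observation

inv = np.zeros(2000)
inv[0] = 1.0
for k in range(1, 5):                   # truncated inverse of (1 + 0.6 z^-400)
    inv[400 * k] = (-0.6) ** k
restored = apply_dereverberation_filter(wet, inv)
print(np.mean((restored - dry) ** 2))   # residual error after dereverberation
```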